Skip to content

Conversation

@allenss-amazon
Copy link
Member

@allenss-amazon allenss-amazon commented Oct 9, 2025

fixes #204

@allenss-amazon allenss-amazon changed the title Initial IndexName encode/decode Full support for DB numbers other than zero in cluster mode. Oct 14, 2025
@allenss-amazon
Copy link
Member Author

Fixes issue #204

@allenss-amazon allenss-amazon marked this pull request as ready for review October 14, 2025 00:33
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Comment on lines 41 to 42
The SchemaManager uses the MetadataManager to manage a single name space of
indexes, i.e., map<string, GlobalMetadataEntry>.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the thought going into this. The interoperability is a headache

One question: when someone creates an index in db_num > 0 on Valkey 9, how would a shard running on Valkey 8 behave? I am guessing it would crash, right? I guess that behavior is fine, but we should document this

Another option I guess is that we have namespaces in the metadata manager. Right now we use "vs_index_schema". We could move to "vs_index_schema2"? Nodes on the old versions would just not get the index definitions for those created with vs_index_schema2, which should be fine, and we can register for callbacks of "vs_index_schema" in the new node for backwards compat. Probably a bit more work, but it would prevent the Valkey 8.0 nodes from crashing in such cases

@allenss-amazon
Copy link
Member Author

Revised to reject RDB or cross-shard metadata that contains any object with a semantic version higher than what's understood by the current code.

Comment on lines +56 to +60
The mapping is done by manipulating the object name. It is known that
pre-Valkey 9 object names cannot contain a valid hash-tag. This provides the
necessary information to distinguish between an object name and an encoded
<db_num, object_name> pair. Objects with a db_num of 0 are stored the same
way as pre-Valkey 9 objects, providing backward compatibility.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have the encoding version added - could we just add a new field to the IndexSchema proto (e.g. clustered_db_num)? Old versions would silently drop it, but newer versions would support it.

This way, db_num can be removed from the metadata manager entirely (not encoded in the name and not added as a middle layer of the table).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would disallow two indexes in different db's to have the same name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying problem is that MetadataManager has a two-level naming scheme and now we need a three-level naming scheme. Plus, this stuff is preserved in the RDB file, so there's a backward compatibility issue. We can certainly move to a three-level scheme with conversion from old code. But that's a lot of changes to MetadataManager.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another scheme would be to reuse the top-level naming for each DB. That's a change in the philosophy of MetadataManager as it would require dynamic registration of the type level types.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yeah that makes sense, was missing that part.

If I were redesigning it - the metadata entry would be indexed by a UUID instead of just the index name. The UUID could be computed by the original creator.

Is there some way we could move to a UUID based scheme now? I see two ways:

  1. Leave the existing metadata type namespace, and just change the schema name to be UUID indexed:

If we assume the new UUIDs are unique (they should be, or at least have an astronomically low chance of collision) then they should be able to merge cleanly with the existing IDs. We would just make the "ID" for metadata manager work like: DB!=0 -> UUID, DB==0 -> index name

Reading the code, the only place the ID is used in the Schema Manager is we use this to delete the existing index before processing the newer version of the index. So it shouldn't cause a crash. If you do create an index on DB!=0 when the cluster has nodes on the older version of Valkey-Search, it should work okay actually. If you later went to delete it, it would skip the delete since it would not find the index by the ID (which is now a UUID), which I guess is okay since it isn't even accessible on that node (it is just wasted resources). If you were to recreate it - it would log an error and fail to process - which I also think is okay since it is inaccessible on that node. Once you upgrade to the latest version, it should be able to handle the metadata (maybe we need to add some MetadataManager <-> SchemaManager reconciliation on load)

  1. Add a new metadata type namespace for UUID-indexed:

This would separate the two indexing schemes into two high level types, the old one, which is used for DB=0, and the new one, which is used for DB!=0.

Nodes on the old versions would just not apply the metadata for unknown types. So this works quite cleanly. Once they upgrade, ideally they would then apply the metadata


What do you think? 1 is less clean on the older version nodes, but this already feels like an edge case. As long as we can get it to not crash and it would eventually recover when fully upgraded, I am okay with some odd behaviors on the old nodes. But 2 should be quite clean, I just don't know if we want to support this two-type setup long term (it is complexity that we later need to clean up)

Copy link
Member Author

@allenss-amazon allenss-amazon Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today, the MetadataManager takes care of resolving consistency issues. In order to support UUID naming, it would have to have some way to determine if a newly received UUID conflicts with some existing UUID. That would require that registered types have some mechanism that would assist this. In other words a client would be queried with a new protobuf and would have to return some UUID that conflicted with it. That seems worse to me, the value of MetadataManager is that he understands the objects and performs collision resolution.

Here, the problem is fundamentally naming. If we confine the discussion to one registered type, then the current system has a flat namespace. I think you either redesign MetadataManager to have a more hierarchical namespace OR you encode the hierarchy into the existing flat namespace. I've chosen the second path -- because I believe you were on the right path before, but the addition of the dbnum, was just unexpected. (BTW, we'll face exactly the same issues when we implement aliasing -- which I plan to do soon).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fundamentally about deriving some "primary key" from the index metadata that we can use to uniquely identify an index. It seems like a fundamental inflexibility that the primary key cannot be updated. That's fine as long as we are sure it will never really update. If dbnum is some once-in-a-project problem, then I am okay with what you have proposed.

But - if we need to add another field to the primary key - it seems like we won't be able to take the same path out again, right?

How about this: could we use a proto for the primary key?

message IndexMetadataKey {
     string name = 1;
     int db_num = 2;
     ...
}

So we would then serialize this and use this as the key. If we ever add a new field to the key, it would be cross version compatible as long as the default value for that field is handled. It is a similar option to UUID, but it guarantees two semantically identically indexes collide and are reconciled by the metadata manager

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Support multi-db in CME

2 participants